token.err, 1992-09-14
This incident report is provided as supporting documentation for a message
left on NetWire on 9/9/92. The message describes some problems with the new
Token Ring drivers for 3.11, and omissions in the update documentation shipped
with the drivers.
The following document refers to several servers by name. NN_server and
DEVELOPMENT_server are NW 3.11 servers: IBM PS/2 Model 95s with 16MB RAM.
These servers reside on a Token Ring network along with two other NW 2.2
servers.
The Token Ring is monitored by Proteon's network monitoring system. The
Proteon system reports media access control (MAC) layer errors.
Please pardon me if the document rambles a bit, but these reports are used for
future problem determination and evaluation. Our support group feels that
every idea committed to paper helps.
Larry Rubanka,
73465.643
INCIDENT REPORT:
NN_server, running NetWare 3.11 with TOKEN.LAN v3.13, had 100+ days of
uninterrupted uptime.
Installing NetWare for SAA v1.2 placed the new TOKEN.LAN v3.16 on
NN_server. This new LAN driver was loaded and running for several days.
The SAA NLM was neither loaded nor running.
Two similar, probably identical, events occurred within two weeks following
the SAA installation. No similar events had occurred before the SAA
installation.
SYMPTOMS OBSERVED:
1) A number of Receive Congestion (RC) errors reported by the Proteon network
monitor. These RC errors were on the ring used by four different servers and
over 100 nodes.
2) Token Ring was passing data, and workstations could log into and use other
servers.
3) No errors were reported on any servers, including NN_server.
4) SLIST did not show NN_server yet still showed other servers.
5) Performance at workstations not using NN_server was not noticeably
affected.
6) SNA gateway was in use and functioning.
7) Workstations attached to and accessing NN_server encountered errors.
8) Packet Receive Buffers on NN_server climbed continuously, and server
utilization showed 25%. This is high compared to our normal 5-10% level.
9) Another 3.11-based server, DEVELOPMENT_server, was running TOKEN.LAN v3.13
and was functioning properly. Packet Receive Buffers and percent utilization
of DEVELOPMENT_server were normal.
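Symptoms 8 and 9 amount to a comparison against a baseline: utilization above
the normal range together with a Packet Receive Buffer count that keeps
climbing. A minimal sketch of that check follows. The function name, the
sample values, and the 10% threshold (the top of our normal 5-10% range) are
illustrative assumptions, not output from any NetWare utility.

```python
def looks_stalled(utilization_pct, prb_samples, normal_max_pct=10):
    """Flag a server whose utilization is above baseline while its
    Packet Receive Buffer count rises in every successive sample.

    prb_samples is a hypothetical list of PRB counts read by hand from
    the console at intervals, oldest first.
    """
    if len(prb_samples) < 2:
        return False  # growth cannot be judged from a single reading
    rising = all(later > earlier
                 for earlier, later in zip(prb_samples, prb_samples[1:]))
    return utilization_pct > normal_max_pct and rising


# Made-up readings resembling NN_server during the incident
print(looks_stalled(25, [40, 55, 70]))
# Made-up readings resembling DEVELOPMENT_server
print(looks_stalled(7, [10, 10, 10]))
```

Either condition alone is weak evidence; it is the combination that set
NN_server apart from DEVELOPMENT_server.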
SOLUTIONS TRIED:
1) Excluded (disallowed access to the ring) any workstations reporting MAC
layer Token Ring errors. No effect.
2) Restarted several workstations that were reporting MAC layer errors. No
effect.
3) Eliminated all "Receive Congestion" errors from network by restarting any
workstations reporting errors.
4) Shut down NN_server, and restarted it. The problems recurred within ten
minutes, after other RC errors.
5a) Unloaded TOKEN.LAN v3.16. Stopped Packet Receive Buffer growth. Percent
utilization dropped to zero.
5b) Loaded older TOKEN.LAN v3.13. Resumed server operation without errors.
Logged in and out, used SLIST and SNA services. Everything was functioning
properly at this point.
5c) Shut down and restarted NN_server to reallocate RAM appropriately (Packet
Receive Buffers cannot be deallocated). No problems since.
CONCLUSIONS:
1) The problem seems local to the NN_server. DEVELOPMENT_server, similarly
configured, showed no problems, nor did other 2.2-based servers.
2) A significant difference between DEVELOPMENT_server and NN_server is the
TOKEN.LAN version. These servers also use different disk drivers.
3) Receive Congestion errors are rare on our network. In both cases, the
incident was preceded by Receive Congestion errors. Cause or effect? Could
the Receive Congestion errors have "triggered" the problem with v3.16
TOKEN.LAN, or were they merely a symptom of the problem? DEVELOPMENT_server,
running TOKEN.LAN v3.13, did not have any problems dealing with these errors.
4) In both incidents, the RC errors occurred when a PC experienced an error
under QEMM 386. I speculate that QEMM has paused the CPU and that the receive
buffer on the TR card is not being serviced. The card continues to function,
however, and reports the RC errors.
5) Due to time and availability constraints, unloading and reloading the
TOKEN.LAN v3.16 was not tried. However, TOKEN.LAN was stopped and
restarted via the NN_server shut down and restart. Restarting the server, and
hence the driver, did not eliminate the problems.
6) TOKEN.LAN v3.16 seems to be the problem.
NOTES:
The SAA installation introduced a new version of NMAGENT.NLM. This is not
used by other servers.
SAA installation was not completed due to errors encountered with
configuration values.
The documentation for TOKENDMA.LAN describes a condition where the driver
pauses execution until beaconing stops. This pause accounts for the queuing
of ECBs. I believe that the driver's behavior differs significantly from that
which is described in this documentation.
After adopting the TOKEN.LAN version 3.16, shipped with SAA, I recognized a
problem similar to that described in the update docs.
I am making some assumptions to help me work around the omissions in the
update documents.
I assume the behavior of v3.16 TOKEN is similar to v3.16 TOKENDMA with regard
to its reaction to "Beaconing" errors. I am also assuming that the queueing
of ECBs in the send queue would drive up the number of Packet Receive Buffers
allocated and/or the Permanent Pool memory.
Given these assumptions, I have observed the following behavior of the
TOKEN.LAN v3.16 driver.
The driver appears to pause when a "RECEIVE CONGESTION ERROR" (RC error)
occurs. The driver does not appear to resume when the RC error is cleared.
Several minutes (20 or more) after the node(s) reporting RC errors are
removed, the server is still unavailable.
In the presence of RC errors, the 3.16 TOKEN.LAN driver appears to pause. The
Permanent memory pool continues to grow. Packet Receive Buffers (PRBs) grow.
I have not waited for the PRBs to grow up to Max PRB.
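The growth described above can be pictured with a toy model. This is a sketch
under my stated assumptions only: a paused driver never releases the buffers
it holds, so the pool grows with every arrival until it reaches the
configured maximum. The function name, arrival rate, and pool limits are
hypothetical and do not describe the actual internals of TOKEN.LAN or the
NetWare memory manager.

```python
def simulate_prb_growth(arrivals_per_tick, ticks, min_prb, max_prb,
                        driver_paused):
    """Return the allocated buffer count after each tick (toy model)."""
    allocated = min_prb  # the pool starts at its configured minimum
    in_use = 0           # buffers holding packets not yet released
    history = []
    for _ in range(ticks):
        in_use += arrivals_per_tick
        if not driver_paused:
            in_use = 0   # healthy driver: buffers are released each tick
        in_use = min(in_use, max_prb)       # demand cannot exceed the maximum
        allocated = max(allocated, in_use)  # the pool grows but never shrinks
        history.append(allocated)
    return history


# Hypothetical values: 5 packets per tick, pool limits of 10 and 100
print(simulate_prb_growth(5, 4, 10, 100, driver_paused=True))
print(simulate_prb_growth(5, 4, 10, 100, driver_paused=False))
```

In this model the paused case climbs steadily while the healthy case stays at
the minimum, which matches the difference observed between NN_server and
DEVELOPMENT_server. The pool never shrinks in the model, mirroring the fact
that a server restart was needed to reclaim the buffers.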
While the PRBs and Permanent memory grow, the server continues operating.
Only Token Ring activity has paused. MONITOR continues to function, as do
other NLMs.
If a connection is specified for clearing via MONITOR while the driver is
paused, MONITOR pauses. With MONITOR paused, the server is STILL functioning.
If MONITOR is unloaded from the console, the console pauses. I would expect
this behavior if the driver were paused. I presume that if the driver
resumed, the clear connection would proceed, and the unload would continue.
Clearing the RC errors does not cause the driver to resume, at least not
quickly. I waited approximately twenty minutes after eliminating all MAC
layer errors, but the driver remained paused.
The RC error is reported when a Token Ring interface's receive buffer is not
being serviced, and overflows. This is a node error, not a ring error. Token
Ring traffic continues in the presence of RC errors. To the best of my
knowledge, this is not "Beaconing."
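The RC mechanism described above can be illustrated with a small model: an
adapter with a fixed number of receive slots counts a congestion error each
time a frame arrives and no slot is free. The class, the slot count, and the
frame count are hypothetical and chosen only for illustration; this is not
how any particular Token Ring adapter is implemented.

```python
from collections import deque


class ReceiveBuffer:
    """Toy adapter receive buffer with a fixed number of frame slots."""

    def __init__(self, slots):
        self.slots = slots
        self.frames = deque()
        self.congestion_errors = 0

    def frame_arrives(self, frame):
        if len(self.frames) >= self.slots:
            # No free slot: the frame is lost and an RC error is counted
            self.congestion_errors += 1
            return False
        self.frames.append(frame)
        return True

    def service(self):
        """The host drains the buffer; returns the number of frames taken."""
        drained = len(self.frames)
        self.frames.clear()
        return drained


def run(host_stalled, slots=4, frames=10):
    """Deliver frames to one node and return its RC error count."""
    buf = ReceiveBuffer(slots)
    for n in range(frames):
        buf.frame_arrives(n)
        if not host_stalled:
            buf.service()  # a responsive host empties the buffer promptly
    return buf.congestion_errors


print(run(host_stalled=False))  # serviced node
print(run(host_stalled=True))   # node whose CPU has stopped servicing the card
```

Only the stalled node accumulates errors in this model. Nothing in it affects
the other nodes, which is consistent with RC being a node error rather than a
ring error.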
While the RC errors persist, the Token Ring continues to pass traffic. Other
3.11 servers running v3.13 TOKEN.LAN continue normally, as do 2.2 servers.